Abstract: Multicasting is the general method of conveying the
same information to multiple users over a broadcast channel.
In this work, the Gaussian MIMO broadcast channel is considered,
with multiple users and any number of antennas at
each node. A "closed loop" scenario is assumed, for which a
practical capacity-achieving multicast scheme is constructed. In
the proposed scheme, linear modulation is carried over time and
space together, which allows the problem to be transformed into that of
transmission over parallel scalar sub-channels, the gains of which
are equal, except for a fraction of sub-channels that vanishes with
the number of time slots used. Over these sub-channels, off-the-shelf
fixed-rate AWGN codes can be used to approach capacity.



I. Introduction 

A recurring theme in digital communications is the use 
of a standard "off-the-shelf" coding module in combination 
with appropriate linear pre/post processing which is tailored 
to the specific channel model. Such methods are appealing 
due to their low complexity of implementation as well as 
conceptually, since the tasks of coding and modulation are
effectively decoupled. 

Underlying such decoupling schemes is the existence of a 
diagonalization transformation. For time-invariant scalar sys- 
tems, this is possible via the Fourier transform. The singular- 
value decomposition (SVD) plays a similar role for Gaussian 
multiple-input multiple-output (MIMO) channels. In recent 
years, as coding and decoding for single-user scalar channels 
have reached a mature stage, research effort has shifted to
tackling the more ambitious goal of efficient multi-user (and 
multi-antenna) communication networks. 

Extension of the decoupling approach which is at the heart 
of single-user scalar systems to a multiple-user MIMO net- 
work requires, however, overcoming a major hurdle: (unitary) 
simultaneous diagonalization is in general not possible.¹

Hence, different practical approaches were proposed over 
the years for the problem of multicasting over Gaussian MIMO 
broadcast channels. However, none of these approaches is ca- 
pacity achieving in general, even for simple cases. To illustrate 
this, consider the following simple three-user example: 



$$ y_i = H_i x + z_i , \qquad i = 1, 2, 3, $$

where $z_i$ are white Gaussian noises with unit power (for each
element), $x$ is the channel input vector subject to an average power

* This work was supported in part by the ISF under Grant No. 1557/12. 

1 Even if all matrices are diagonal, constructing a practical capacity-achieving
scheme is hard, since using scalar coding over the resulting parallel
channels is limited to working w.r.t. the minimal gain over each sub-channel.



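constraint $P$, and $H_i$ are the complex-valued channel matrices

$$ H_1 = \begin{pmatrix} \alpha & 0 \\ 0 & \alpha \end{pmatrix}, \quad H_2 = \begin{pmatrix} \beta & 0 \end{pmatrix}, \quad H_3 = \begin{pmatrix} 0 & \beta \end{pmatrix}, \tag{1} $$

where $\alpha$ and $\beta$ are chosen such that the (individual) capacities
of all channels are equal, viz.

$$ C_{p2p} = 2 \log\left(1 + |\alpha|^2 P / 2\right) = \log\left(1 + |\beta|^2 P\right). \tag{2} $$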



This example models a "near-far" scenario in which the 
two "near" users have one antenna each, where each receives 
a different transmit antenna stream, whereas the "far" user is 
equipped with two antennas to compensate for the distance 
attenuation, such that the capacities of all channels are equal. 

A naive approach would be to use Vertical Bell-Laboratories 
Space-Time coding (V-BLAST) [1] or generalized decision 
feedback equalization (GDFE) [2]. This approach is based
upon the QR decomposition, in which the output of the 
channel is multiplied by a unitary matrix, resulting in an 
effective triangular channel matrix. In the case of the example, 
however, repetition coding across the antennas must be used, 
to convey the (common) message to users 2 and 3. This, in 
turn, implies that the far user (user 1) cannot enjoy any of its 
multiplexing gain (which is equal to two in this case), due to 
the repetition that is carried across the two transmit antennas. 
Again, in the high SNR regime, this suggests a loss of 
approximately half of the optimal achievable rate. Moreover, 
for this example the "max-min beamforming" technique (see,
e.g., [3] and references therein) reduces to this scheme as well,
meaning that it suffers the same losses in performance. 

Another approach considered in the literature for this problem
is that of using a "pure open-loop" approach, namely
Alamouti [4] for the two-transmit antenna case, and space-time
coding [5] for more. The performance of these
schemes does not depend on the number of receivers. However,
this universality comes at the price of a substantial rate loss
for MIMO channels having several receive antennas, as these
schemes use only a single stream, thus failing to achieve the
multiplexing gain offered by the MIMO channel.²

Finally, note that time/frequency sharing incurs a large loss
in performance (up to two thirds of the capacity in this case).

In general, to the best of our knowledge, known practical
schemes are limited to the smallest number of degrees of
freedom ("multiplexing gain") of the different users, or alternatively
incorporate time- or frequency-sharing, which again

2 Moreover, for more than two transmit antennas, the space-time codes of
[5] attain strictly less than one degree of freedom.



lose degrees of freedom. Thus, these schemes achieve only a 
fraction of the available degrees of freedom. 

In this work we develop a scheme that achieves the degrees 
of freedom of each individual channel, by enabling the trans- 
mission of several streams as is done in V-BLAST/GDFE in 
the point-to-point case. This is done by designing a special 
space-time coding structure that is tailored for the specific 
channel matrices, where the number of channel uses that is 
needed to be jointly processed depends on the number of 
users in the system. However, in contrast to the open-loop 
space-time coding structures, which strive for an "orthogonal 
design" structure (see, e.g., Q), the space-time structure 
presented in this work results in triangular forms, similar to 
V-BLAST/GDFE, but having equal diagonals, which suggests 
in turn, the optimality of the scheme. This gives rise to ef- 
fective parallel scalar additive white Gaussian noise (AWGN) 
channels, over which standard codes can be used to approach 
capacity. Thus, the proposed scheme could be thought of as 
an interpolation between the open-loop space-time coding 
technique and the point-to-point V-BLAST one. 

II. Channel Model 

The K-user Gaussian MIMO broadcast channel consists of
one transmit and K receive nodes, where each received signal
is related to the transmitted signal through a MIMO link:

$$ y_i = H_i x + z_i , \qquad i = 1, \ldots, K, $$

where $x$ is the channel input of dimensions $n_t \times 1$ subject
to an average power constraint $P$;³ $y_i$ is the channel output
vector of receiver $i$ ($i = 1, \ldots, K$) of dimensions $n_r^{(i)} \times 1$; $H_i$
is the channel matrix to user $i$ of dimensions $n_r^{(i)} \times n_t$; and
$z_i$ is an additive circularly-symmetric Gaussian noise vector
of dimensions $n_r^{(i)} \times 1$, where, without loss of generality, we
assume that the noise elements are mutually independent and
identically distributed with unit power.

The aim of the transmitter is to multicast the same (common)
message to all the receivers. The capacity of this
scenario is long known to equal the (worst-case) capacity of
the compound channel (see, e.g., [6]), with the compound
parameter being the channel matrix index:

$$ C\left(\{H_i\}_{i=1}^{K}, P\right) = \max_{C_x} \min_{i = 1, \ldots, K} I(H_i, C_x) , \tag{3} $$

where $I(H_i, C_x)$ is the mutual information between the
channel input $x$ and the channel output $y_i$, obtained by taking
$x$ to be Gaussian with covariance matrix $C_x$:

$$ I(H_i, C_x) \triangleq \log \det\left(I + H_i C_x H_i^\dagger\right) , $$

and the maximization is carried over all admissible input
covariance matrices $C_x$, satisfying the power constraint.
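
As a numerical illustration of the inner expression in (3), the following sketch evaluates $\min_i I(H_i, C_x)$ for the three-user example (1)-(2) at the white input $C_x = (P/2) I$. The function names, the choice of base-2 logarithms, and the values $P = 100$ and $C_{p2p} = 10$ are assumptions made for illustration. The white input is assumed here to be a reasonable choice because of the symmetry of the example; in general this evaluation only lower-bounds the maximization in (3).

```python
import numpy as np

def mutual_information(H, Cx):
    """I(H, Cx) = log2 det(I + H Cx H^dagger), in bits per channel use."""
    H = np.atleast_2d(np.asarray(H, dtype=complex))
    M = np.eye(H.shape[0]) + H @ Cx @ H.conj().T
    _, logdet = np.linalg.slogdet(M)  # natural-log determinant of a positive-definite matrix
    return logdet / np.log(2)

def example_channels(c_p2p, P):
    """Three-user example (1), with gains solving the equal-capacity condition (2)."""
    alpha = np.sqrt(2.0 * (2.0 ** (c_p2p / 2.0) - 1.0) / P)
    beta = np.sqrt((2.0 ** c_p2p - 1.0) / P)
    H1 = alpha * np.eye(2)           # "far" user with two receive antennas
    H2 = np.array([[beta, 0.0]])     # "near" user seeing transmit antenna 1
    H3 = np.array([[0.0, beta]])     # "near" user seeing transmit antenna 2
    return [H1, H2, H3]

def white_input_rate(channels, P):
    """min_i I(H_i, Cx) evaluated at the white input Cx = (P / n_t) I."""
    n_t = channels[0].shape[1]
    Cx = (P / n_t) * np.eye(n_t)
    return min(mutual_information(H, Cx) for H in channels)

P, c_p2p = 100.0, 10.0               # illustrative values (assumptions)
channels = example_channels(c_p2p, P)
rate = white_input_rate(channels, P)
# Time-sharing serves each of the three users one third of the time at rate C_p2p.
time_sharing_fraction = (c_p2p / 3.0) / rate
```

Under these assumptions the near users limit the white-input rate, and time-sharing attains a little over a third of it.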

3 Alternatively, one can consider an input covariance constraint $E[x x^\dagger] \preceq C$,
where by $C_1 \preceq C_2$ we mean that $(C_2 - C_1)$ is positive semi-definite.



III. Background 

In this section we recall the transmission and receiving 
scheme for the single- and two-user cases, and explain how 
this scheme can be generalized to the multi-user case. 

A. Unitary Matrix Triangularization 

The proposed scheme in this section is based on several
forms of matrix decompositions, one of which is the geometric
mean decomposition (GMD) [7]. For simplicity, we will only
consider the decomposition of square matrices throughout
this work. As we show in the sequel, this does not pose
any restriction on the communication problem addressed. The
GMD of a square complex invertible matrix $A$ is given by:

$$ A = U T V^\dagger , $$

where $U$, $V$ are unitary matrices, and $T$ is an upper-triangular
matrix such that all its diagonal values are equal to the geometric
mean of the singular values of $A$, which is real and positive.
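
To make the equal-diagonal property concrete, here is a minimal sketch of the GMD for the special case of a real positive $2 \times 2$ diagonal matrix, using a pair of plane rotations. It is not the general algorithm of [7]; the function name `gmd_2x2_diagonal` and the ordering assumption $d_1 \geq d_2 > 0$ are illustrative choices.

```python
import numpy as np

def gmd_2x2_diagonal(d1, d2):
    """GMD of diag(d1, d2), assuming d1 >= d2 > 0.

    Returns real orthogonal U, V and upper-triangular T such that
    diag(d1, d2) = U @ T @ V.T and both diagonal entries of T equal sqrt(d1 * d2).
    """
    if not d1 >= d2 > 0:
        raise ValueError("expected d1 >= d2 > 0")
    D = np.diag([d1, d2]).astype(float)
    if np.isclose(d1, d2):
        return np.eye(2), D, np.eye(2)   # already has equal diagonal entries
    g = np.sqrt(d1 * d2)                 # geometric mean of the singular values
    c = np.sqrt((g**2 - d2**2) / (d1**2 - d2**2))
    s = np.sqrt(1.0 - c**2)
    V = np.array([[c, -s], [s, c]])      # rotation applied on the right
    U = np.array([[c * d1, -s * d2], [s * d2, c * d1]]) / g  # rotation on the left
    T = U.T @ D @ V
    return U, T, V

U, T, V = gmd_2x2_diagonal(2.0, 0.5)     # unit determinant, so diag(T) is all ones
```

A general matrix can be reduced to this diagonal case through its SVD, which is why the two-by-two step is a useful building block.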

Building on the GMD, the following decomposition, which
will be referred to as Joint Equi-diagonal Triangularization
(JET), was introduced in [8]. Let $A_1$ and $A_2$ be two invertible
complex matrices of dimensions $n \times n$ such that
$|\det(A_1)| = |\det(A_2)|$. Then, the joint triangularization of $A_1$
and $A_2$ is given by:

$$ A_1 = U_1 R_1 V^\dagger , \qquad A_2 = U_2 R_2 V^\dagger , \tag{4} $$

where $U_1, U_2, V$ are $n \times n$ unitary matrices, and $R_1, R_2$ are
upper-triangular $n \times n$ matrices with the same real-valued,
non-negative diagonal values, namely,

$$ [R_1]_{ii} = [R_2]_{ii} , \qquad \forall i = 1, \ldots, n. $$

B. Point-to-Point MIMO Scheme via Matrix Triangularization

We now review the transmission scheme known as the
uniform channel decomposition (UCD) [9], which is in turn
based upon the derivation of the MMSE version of Vertical
Bell-Laboratories Space-Time coding (V-BLAST), see, e.g.,
[10]. Later in the paper we take the triangularization to be
one which is simultaneously good for several users.

Define the following augmented matrix:⁴

$$ \tilde{G} \triangleq \begin{pmatrix} H C_x^{1/2} \\ I_{n_t} \end{pmatrix} , $$

where $I_{n_t}$ is the $n_t \times n_t$ identity matrix.

4 $C_x^{1/2}$ is any matrix $B$ satisfying $B B^\dagger = C_x$, and can be found, e.g.,
via the Cholesky decomposition.

Next, the matrix $\tilde{G}$ is transformed into a square matrix, by
means of the QR decomposition:

$$ \tilde{G} = Q G , $$

where $Q$ is an $(n_r + n_t) \times n_t$ matrix with orthonormal columns
and $G$ is an $n_t \times n_t$ upper-triangular matrix with real-valued
positive diagonal elements. Now the matrix $G$ is decomposed
according to the GMD:

$$ G = U T V^\dagger , $$

where $T$ is upper-triangular with all diagonal values equal
to $\det(G)^{1/n_t}$, and $\det(G)^{2/n_t} - 1$ is the effective signal-to-noise
ratio of the scalar sub-channels.
The transmission scheme is as follows:

1) Construct $n_t$ codewords of equal rates for a scalar AWGN
channel of signal-to-noise ratio (SNR) $\det(G)^{2/n_t} - 1$.

2) In each channel use, an $n_t$-length vector $\tilde{x}$ is formed
using one sample from each codebook. The transmitted
vector $x$ is then obtained using the following precoder:

$$ x = C_x^{1/2} V \tilde{x} . $$

3) The receiver calculates

$$ \tilde{y} = U^\dagger \tilde{Q}^\dagger y , $$

where $\tilde{Q}$ consists of the first $n_r$ rows of $Q$.

4) Finally, the codebooks are decoded using successive interference
cancellation, starting from the $n_t$-th codeword
and ending with the first one: The $n_t$-th codeword is decoded
first, using the $n_t$-th element of $\tilde{y}$, treating the other
codewords as AWGN. The effect of the $n_t$-th element of
$\tilde{x}$ is then subtracted out from the remaining elements of
$\tilde{y}$. Next, the $(n_t - 1)$-th codeword is decoded, using the
$(n_t - 1)$-th element of $\tilde{y}$, and so forth.
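
The decoding order of step 4 can be sketched at the symbol level as back-substitution over an upper-triangular system. This is an idealized, noise-free illustration: the actual scheme decodes whole codewords and operates on the MMSE receiver output, and the function name `successive_cancellation` is hypothetical.

```python
import numpy as np

def successive_cancellation(T, y_tilde):
    """Recover x_tilde from y_tilde = T @ x_tilde for an upper-triangular T.

    The last element is resolved first; its contribution is then subtracted
    from the remaining elements, and so on up to the first element.
    """
    n = T.shape[0]
    x_hat = np.zeros(n, dtype=complex)
    for k in range(n - 1, -1, -1):
        # Interference from the streams that were already resolved.
        interference = T[k, k + 1:] @ x_hat[k + 1:]
        x_hat[k] = (y_tilde[k] - interference) / T[k, k]
    return x_hat
```

With noise present, each division would be replaced by decoding of the corresponding scalar codeword before its effect is subtracted.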

The optimality of this scheme, i.e., that it is capacity achieving,
was proved in [8, Sec. IV].
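
The rate bookkeeping behind this optimality can be checked numerically. The sketch below builds the augmented matrix, takes its reduced QR decomposition, and compares $n_t$ equal-rate streams at the effective SNR with $\log\det(I + H C_x H^\dagger)$. The function name, the random channel, and the white covariance are assumptions; the GMD step itself is not needed here, since it leaves $\det(G)$ unchanged in magnitude.

```python
import numpy as np

def ucd_effective_snr(H, Cx):
    """Effective per-stream SNR of the triangularized channel for input covariance Cx."""
    n_r, n_t = H.shape
    B = np.linalg.cholesky(Cx)                 # B @ B^dagger = Cx
    G_aug = np.vstack([H @ B, np.eye(n_t)])    # (n_r + n_t) x n_t augmented matrix
    Q, G = np.linalg.qr(G_aug)                 # reduced QR: Q has orthonormal columns
    det_G = np.abs(np.linalg.det(G))           # abs() since NumPy does not fix diagonal signs
    snr = det_G ** (2.0 / n_t) - 1.0
    Q_top = Q[:n_r, :]                         # first n_r rows, used by the receiver
    return snr, G, Q_top

rng = np.random.default_rng(0)
n_r, n_t, P = 3, 2, 4.0                        # illustrative dimensions and power
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
Cx = (P / n_t) * np.eye(n_t)
snr, G, Q_top = ucd_effective_snr(H, Cx)

capacity_bits = np.linalg.slogdet(np.eye(n_r) + H @ Cx @ H.conj().T)[1] / np.log(2)
sum_rate_bits = n_t * np.log2(1.0 + snr)       # n_t equal-rate scalar streams
```

The two quantities agree because $\det(G)^2 = \det(\tilde{G}^\dagger \tilde{G}) = \det(I + H C_x H^\dagger)$ for the augmented matrix $\tilde{G}$.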

IV. MIMO Multicast Scheme 



The scheme of Section III-B can be generalized to the K-user
case in a straightforward manner.⁵ However, in order to
approach the capacity (3) using the same scalar codebook
over all scalar sub-channels, as in Section III-B, the existence
of a joint unitary matrix decomposition of the form

$$ A_i = U_i T_i V^\dagger , \qquad i = 1, \ldots, K, $$

is required, assuming $A_i$ are square invertible matrices with
equal determinants (up to phase), of dimensions $n \times n$, where
$U_i$ are unitary matrices (corresponding to operations performed
at the receivers), $V$ is unitary as well (corresponding to
an operation performed by the transmitter) and $T_i$ are upper-triangular
matrices with constant diagonals. Unfortunately,
such a decomposition does not exist in general for more than
one matrix since there are not enough degrees of freedom
offered by the unitary matrices, as the unitary matrix on the
right, $V$, is the same for all decomposed matrices $\{A_i\}$ (corresponding
to the common operation carried at the transmitter);
for more details see [11]. To overcome this problem, in order to

5 A similar scheme for the two-user MIMO multicast case was proposed in 
(U, where JET g} was used, implying the need of using scalar codebooks of 
different rates. 



gain more degrees of freedom, we propose to utilize multiple
channel uses of the same channel realization and process
them together. The idea of mixing the same symbols between
multiple channel uses has much in common with space-time
codes [4], [5]. In the next section we show how, using this
idea, nearly optimal joint triangularization may be obtained.

V. Space-Time Triangularization 

We now show how to utilize a space-time structure in order 
to obtain nearly-optimal joint triangularization of K matrices, 
such that the resulting triangular matrices have constant diag- 
onals, up to a small portion of the diagonal extreme elements. 
The resulting scheme becomes asymptotically optimal for 
large values of N, where N is the number of channel uses 
grouped together for the purpose of joint decomposition. This 
result is stated in the following theorem. 

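Theorem 1 (Nearly-Optimal K-GMD): Let $A_1, \ldots, A_K$ be
complex-valued $n \times n$ matrices satisfying $|\det(A_i)| = 1$, and let
$N \geq n^{K-1}$. Define the following $nN \times nN$ extended matrices:

$$ \mathcal{A}_i = \begin{pmatrix} A_i & & & \\ & A_i & & \\ & & \ddots & \\ & & & A_i \end{pmatrix} , \qquad i = 1, \ldots, K. $$

Then there exist matrices $\mathcal{U}_1, \ldots, \mathcal{U}_K, \mathcal{V}$, all of dimensions
$nN \times n(N - (n^{K-1} - 1))$, with orthonormal columns, such
that:

$$ \mathcal{U}_i^\dagger \mathcal{A}_i \mathcal{V} = \begin{pmatrix} 1 & * & \cdots & * \\ 0 & 1 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix} , \qquad i = 1, \ldots, K, $$

where * represents some value (which may differ within each
matrix as well as between different ones).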

By using this decomposition, the same scheme as in Section III-B
can be employed, such that the $N$ channel uses are
effectively transformed into $n(N - (n^{K-1} - 1))$ equal-rate
scalar AWGN channels. The sum of the capacities of these
channels tends to the capacity of the original channel for large
values of $N$, where the only loss comes from edge effects
(truncation of the extreme $(n^K - n)$ elements).
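
The edge-effect loss can be quantified by simple counting: out of $nN$ diagonal elements, $n(N - (n^{K-1} - 1))$ are usable. The sketch below computes this fraction and the smallest $N$ that reaches a target fraction, using exact rational arithmetic. The function names are hypothetical, and the fraction is taken here as a proxy for the achieved portion of capacity, under the assumption that every usable sub-channel carries the same rate.

```python
from fractions import Fraction
from math import ceil

def usable_fraction(n, K, N):
    """Fraction of the n*N diagonal elements with unit gain, for N >= n**(K-1)."""
    lost = n ** (K - 1) - 1          # channel uses' worth of truncated edge elements
    if N < n ** (K - 1):
        raise ValueError("Theorem 1 requires N >= n**(K-1)")
    return Fraction(N - lost, N)

def min_channel_uses(n, K, target):
    """Smallest admissible N whose usable fraction is at least `target` (0 <= target < 1)."""
    target = Fraction(target)
    lost = n ** (K - 1) - 1
    return max(n ** (K - 1), ceil(lost / (1 - target)))

# n = 2 antennas and K = 3 users, as in the special case treated in this section.
half = min_channel_uses(2, 3, Fraction(1, 2))
ninety_percent = min_channel_uses(2, 3, Fraction(9, 10))
```

Passing targets as `Fraction` objects avoids floating-point rounding at exact thresholds such as 80%.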

Remark 1: For the case where the matrices have non-equal
determinants, the K-GMD of Theorem 1 results in K triangular
matrices, each with a constant diagonal with entries that are
equal to $\sqrt[n]{|\det(A_i)|}$.

Remark 2: It was shown in [11, Lemma 1] that K-GMD is
equivalent to (K+1)-JET. Hence, nearly-optimal (K+1)-JET
can be obtained with the same parameters as in Theorem 1.

We first present the tools used in the construction. We then
demonstrate the construction for the special case of $2 \times 2$
matrices, $K = 3$ users and $N = 4$ augmentations. The general
case (utilizing the same tools) is given, as Matlab and Python
codes, in [12], [13].



Definition 1: Let $A$ and $B$ be matrices of dimensions $n \times m$
and $2 \times 2$, respectively. We define the operation of "extraction"
of indices $i$ and $j$ from $A$ by:

$$ A[i, j] = \begin{pmatrix} A_{ii} & A_{ij} \\ A_{ji} & A_{jj} \end{pmatrix} , $$

where $[i, j] = \{(i,i), (i,j), (j,i), (j,j)\}$.

We further define the "embedding" operation
$I_n^{B}([m_1, n_1][m_2, n_2] \cdots)$ as the replacement of the elements
in the identity matrix $I_n$ contained in the index-pairs
in $[m_1, n_1][m_2, n_2][m_3, n_3] \cdots$⁶ with the elements
contained in $B$, where index overlap is forbidden, i.e., all the
indices $\{m_i\} \cup \{n_i\}$ are unique. For example, the embedding
$I_4^{B}([1, 3][2, 4])$ of

$$ B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} $$

into the four-dimensional identity matrix $I_4$ is

$$ \begin{pmatrix} b_{11} & 0 & b_{12} & 0 \\ 0 & b_{11} & 0 & b_{12} \\ b_{21} & 0 & b_{22} & 0 \\ 0 & b_{21} & 0 & b_{22} \end{pmatrix} . $$
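
The two operations of Definition 1 translate directly into array indexing. The following is a minimal sketch with hypothetical helper names `extract` and `embed`, using the 1-based indices of the text; it is not taken from the codes referenced above.

```python
import numpy as np

def extract(A, i, j):
    """Extraction A[i, j]: the 2x2 sub-matrix on rows and columns i and j (1-based)."""
    idx = np.array([i - 1, j - 1])
    return A[np.ix_(idx, idx)]

def embed(n, B, pairs):
    """Embedding of the 2x2 matrix B into the identity I_n at each index pair (1-based)."""
    flat = [index for pair in pairs for index in pair]
    if len(set(flat)) != len(flat):
        raise ValueError("index overlap is forbidden")
    M = np.eye(n, dtype=np.asarray(B).dtype)
    for m, k in pairs:
        idx = np.array([m - 1, k - 1])
        M[np.ix_(idx, idx)] = B      # overwrite the four entries (m,m), (m,k), (k,m), (k,k)
    return M

B = np.array([[1, 2], [3, 4]])       # arbitrary illustrative values
E = embed(4, B, [(1, 3), (2, 4)])
```

Embedding at the adjacent pairs [1, 2][3, 4] gives a block-diagonal matrix with copies of `B`, whereas the interleaved pairs [1, 3][2, 4] used above give the Kronecker product of `B` with the $2 \times 2$ identity.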
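Algorithm for $n = 2$, $K = 3$, $N = 4$: Denote by $\{\mathcal{A}_i\}$
the augmented matrices corresponding to $N = 4$ channel uses.

Stage 1: Start by applying a 1-GMD to each block (corresponding
to a single channel use) of the first matrix $\mathcal{A}_1$:

$$ (U_1^{(1)})^\dagger A_1 V^{(1)} = \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix} , $$

which corresponds, in turn, to applying the following extended
unitary matrices

$$ \mathcal{U}_1^{(1)} \triangleq I_8^{U_1^{(1)}}([1, 2][3, 4][5, 6][7, 8]) , \qquad \mathcal{V}^{(1)} \triangleq I_8^{V^{(1)}}([1, 2][3, 4][5, 6][7, 8]) , $$

and results in the extended triangular matrix

$$ T_1^{(1)} = (\mathcal{U}_1^{(1)})^\dagger \mathcal{A}_1 \mathcal{V}^{(1)} , $$

which is block-diagonal with four copies of the $2 \times 2$ triangular block above.

Note that the same matrix $\mathcal{V}^{(1)}$ has to be applied to
all matrices (since the encoder is shared by all users). We
decompose the resulting matrices (after multiplying them by
$\mathcal{V}^{(1)}$) according to the QR decomposition, resulting in unitary
matrices $\{\mathcal{U}_i^{(1)}\}$ such that

$$ T_i^{(1)} = (\mathcal{U}_i^{(1)})^\dagger \mathcal{A}_i \mathcal{V}^{(1)} , \qquad i = 2, 3, $$

are upper-triangular.

6 By $[i, j][k, l]$ we denote $[i, j] \cup [k, l]$.

Stage 2: In the second stage we apply the 1-GMD
to the matrices $T_2^{(1)}[2, 5]$ and $T_2^{(1)}[4, 7]$. In both
cases the two-by-two matrices are of the same form:

$$ (U_2^{(2)})^\dagger \begin{pmatrix} d_2 & 0 \\ 0 & d_1 \end{pmatrix} V^{(2)} = \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix} , $$

where $d_1$ and $d_2$ are the diagonal entries of the $2 \times 2$ triangular
blocks of $T_2^{(1)}$, which satisfy $d_1 d_2 = 1$.

Now note that the matrix corresponding to these elements
in $T_1^{(1)}$ has the identity matrix form $I_2$. Thus, applying $V^{(2)}$ on the
right and $(V^{(2)})^\dagger$ on the left results in the identity matrix:

$$ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = (V^{(2)})^\dagger \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} V^{(2)} . $$

For the third matrix, we apply the QR decomposition with
$U_3^{(2)}$ (assuming no special structure). Define:

$$ \mathcal{U}_i^{(2)} \triangleq I_8^{U_i^{(2)}}([2, 5][4, 7]) , \quad i = 2, 3, \qquad \mathcal{V}^{(2)} \triangleq I_8^{V^{(2)}}([2, 5][4, 7]) . $$

Thus, we attain the following matrices after the completion
of the second stage:

$$ T_1^{(2)} = (\mathcal{V}^{(2)})^\dagger T_1^{(1)} \mathcal{V}^{(2)} , \qquad T_2^{(2)} = (\mathcal{U}_2^{(2)})^\dagger T_2^{(1)} \mathcal{V}^{(2)} , \qquad T_3^{(2)} = (\mathcal{U}_3^{(2)})^\dagger T_3^{(1)} \mathcal{V}^{(2)} . $$
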
Stage 3: Finally, apply the 1-GMD to $T_3^{(2)}[4, 5]$:

$$ (U_3^{(3)})^\dagger \, T_3^{(2)}[4, 5] \, V^{(3)} = \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix} . $$

Again, note that the corresponding sub-matrices of $T_1^{(2)}$ and
$T_2^{(2)}$ are equal to $I_2$. Hence, multiplying them by $V^{(3)}$ on the
right and $(V^{(3)})^\dagger$ on the left gives rise to the identity matrix
$I_2$. By defining

$$ \mathcal{U}_3^{(3)} \triangleq I_8^{U_3^{(3)}}([4, 5]) , \qquad \mathcal{V}^{(3)} \triangleq I_8^{V^{(3)}}([4, 5]) , $$

we arrive at the following three triangular matrices:

$$ T_3^{(3)} = (\mathcal{U}_3^{(3)})^\dagger T_3^{(2)} \mathcal{V}^{(3)} , \qquad T_2^{(3)} = (\mathcal{V}^{(3)})^\dagger T_2^{(2)} \mathcal{V}^{(3)} , \qquad T_1^{(3)} = (\mathcal{V}^{(3)})^\dagger T_1^{(2)} \mathcal{V}^{(3)} . $$

By taking the middle rows and columns (rows and columns
4 and 5) we achieve the desired decomposition with diagonal
elements equal to 1 in all three triangular matrices simultaneously.⁷
Formally we do so by defining the matrix $O$, whose transpose
is composed of rows 4 and 5 of the identity matrix $I_8$,

$$ O^T = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix} , $$

and calculating

$$ O^\dagger T_i^{(3)} O = \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix} , \qquad i = 1, 2, 3. $$

Thus, the total matrices to be applied are

$$ \mathcal{V} = \mathcal{V}^{(1)} \mathcal{V}^{(2)} \mathcal{V}^{(3)} O , $$
$$ \mathcal{U}_1 = \mathcal{U}_1^{(1)} \mathcal{V}^{(2)} \mathcal{V}^{(3)} O , $$
$$ \mathcal{U}_2 = \mathcal{U}_2^{(1)} \mathcal{U}_2^{(2)} \mathcal{V}^{(3)} O , $$
$$ \mathcal{U}_3 = \mathcal{U}_3^{(1)} \mathcal{U}_3^{(2)} \mathcal{U}_3^{(3)} O . $$


VI. Discussion



According to Theorem 1, even when the number of users
K is not very large, a large number of channel uses need
to be combined and processed together in order to approach
the capacity. We demonstrate this phenomenon by considering
the example of the introduction (1), where the gains $\alpha$ and $\beta$
and the power constraint $P$ satisfy (2), i.e., the individual
capacities are equal.

For this case, the best achievable rate of the existing practical
schemes, described in the introduction, is that of single-stream
beamforming, which is therefore taken to be the benchmark
which we aim to improve. Nonetheless, this scheme does not
7 Over the diagonal elements which equal 1, we transmit using SISO codes, 
in our proposed scheme, whereas we make no use of the remaining elements 
as they may take arbitrary values. 
% Capacity             | 33 | 37 | 50 | 60 | 67 | 75 | 80 | 90
Channel uses for GMD   |  5 |  5 |  6 |  8 |  9 | 12 | 15 | 30
Channel uses for JET   |  2 |  2 |  2 |  3 |  3 |  4 |  5 | 10

TABLE I: Number of channel uses, when using K-GMD and K-JET,
processed together to achieve a given portion of the capacity.
For $P \to \infty$, time-sharing between the users achieves 33% of the
capacity, and the benchmark achieves 50%; for $C_{p2p} = 10$ [bits/channel use],
time-sharing achieves 37% of the capacity whereas the benchmark achieves 67%.



utilize the two degrees of freedom offered by the first channel
$H_1$. This becomes more significant in the high SNR regime, in
which this benchmark rate achieves only half of the available
(multicast) capacity [3]. For the scheme proposed in this paper,
we provide in Table I the number of channel uses that need to be
combined and processed together to achieve given portions of
the capacity, where for comparison, we present in the table the
benchmark rate and the rate achieved by time-sharing between
the users, for the case of $P \to \infty$ and the case in which the
individual capacities (2) are equal to $C_{p2p} = 10$ [bits/channel use].



For more users (larger K), the ratio between the benchmark
rate and the capacity deteriorates rapidly as K grows large.
However, the number of channel uses needed to achieve a
certain percentage of the capacity, using the approach developed
in this paper, grows rapidly as well. Yet, based on numerical
evidence, we believe that the number of required channel augmentations
can be reduced. Furthermore, for special families
of MIMO channels, a very significant reduction is possible, as
demonstrated in [14].
8 Both in the benchmark and in the scheme proposed in this paper, we assume
that the scalar codes used are capacity-achieving.



